Summary


The SARS-related coronavirus genome contains a variety of novel accessory genes. One of these, called
ORF7a or ORF8, codes for a protein known as 7a, U122 or X4. We set out to determine the
three-dimensional structure of the soluble ectodomain of this type I transmembrane protein by nuclear
magnetic resonance spectroscopy. The fold of the protein is the first member of a further variation of
the immunoglobulin-like beta-sandwich fold. Because X4 does not reveal significant sequence homologies
to proteins in the databases, we carried out a structure-based similarity search for proteins with known
function. High structural similarity to the D1 domains of ICAM-1 and ICAM-2, and common features in
amino acid sequence between X4 and ICAM-1, suggest that X4 possesses binding activity for the αL
integrin I domain of LFA-1. Further, based on this structure-based prediction, potential functions of X4
in virus replication and pathogenesis are discussed.


Introduction


A novel coronavirus (CoV) has been shown to be the etiologic agent of the severe acute respiratory
syndrome (SARS) epidemic, which affected about 30 countries in late 2002. The viral genome is
almost 30 kb in length and contains at least 11 open reading frames; the exact number
depends on the strain and the minimal count of coded amino acid residues [1-3]. Coronaviruses
are positive-strand RNA viruses that code for the characteristic proteins replicase (R), spike (S),
membrane (M), envelope (E), and nucleocapsid (N) proteins. In addition, SARS-CoV codes for
subgroup-specific accessory proteins that are thought to be dispensable for viral replication in
cell culture, but may be important for virus-host interactions and thus contribute to the virus'
fitness. The important roles of such "accessory" proteins for viral infectivity, replication
efficiency and pathogenic effects are well established and have been investigated, e.g., for the human
immunodeficiency virus (HIV) accessory proteins [4, 5], and for Rhinoviruses [6].

Most SARS-CoV accessory proteins do not 
reveal significant sequence homologies to proteins 
with known function. Thus, a possible approach to 
elucidate potential functions of these proteins may 
be to determine their three-dimensional structures 
and search for structural similarities to proteins
with known functions. 

As an example of such an approach to SARS-CoV accessory proteins, we and others [7] focused
on the X4 protein [1], also called U122 [7], coded by a gene with the names ORF7a [8] and ORF8
[2]. ORF8 was suggested to encode a predicted protein of 122 amino acids that has no significant
BLAST or FASTA matches to known proteins [2]. X4 is predicted to contain a transmembrane helix
comprising residues 99-117. The predicted signal sequence is probably cleaved off between residues
15 and 16. Together these data indicate that X4 is likely to be a type I membrane protein, with the
amino-terminal hydrophilic domain (residues 16-98) oriented inside the lumen of the ER/Golgi or
on the surface of the cell membrane or virus particle, depending on the localization of the
protein.

Recently, it was shown that X4 is expressed in SARS-CoV-infected cells [7, 9]. In addition, the
signal peptide is cleaved in Vero E6 cells transfected with ORF7a. The carboxy-terminus
(KRKTE) represents a functional ER retrieval motif. Experiments to investigate the subcellular
localization in Vero E6 cells show that X4 is indeed present in the ER compartment, probably
in the ER/Golgi intermediate compartment [9], the trans-Golgi network and also in small amounts on
the cell surface of infected cells [7].

It has been shown that an additional SARS 
CoV accessory protein, namely, X1 (U274, ORF3 
or ORF3a), is expressed on the cell surface of 
SARS-CoV-infected cells [10, 11] and interacts 
with X4. In addition, X1 has been shown to 
interact with the SARS-CoV structural proteins E, 
S, and M [10]. 

Another study on X4 concluded that X4 is able to induce apoptosis via the
caspase pathway in various cell types [12]. This
was speculated to be one of the underlying 
mechanisms for the pathogenesis of SARS-CoV 
infections. 

These interesting findings and the lack of sequence homology to proteins with known
function prompted us to determine the three-dimensional solution structure of the X4 ectodomain
containing residues 16-99 by nuclear magnetic resonance (NMR) spectroscopy to derive
hints for potential functions of X4.


Materials and methods 
Cloning of X4e 


A piece of DNA containing codons 16-99 and an additional carboxy-terminal arginine was obtained
by a polymerase chain reaction (PCR) with six different synthetic oligonucleotides with sequences
based on the published amino acid sequence of X4 (Swiss-Prot accession number: P59635) [1, 2]. The
oligonucleotide (primer) sequences were adapted for optimal codon usage of highly expressed
proteins in E. coli:
primer A, CGGAATTCATATGCTGGAAGTTCTGTTCCAGGGGCCC;
primer B, GTTCTGTTCCAGGGGCCCGAACTGTACCACTACCAGGAATGCGTGCGTGGTACCACCGTGCTGCTGAAAGAACCG;
primer C, CGTGCTGCTGAAAGAACCGTGCCCGAGCGGTACCTACGAAGGTAACAGCCCGTTCCACCCGCTGGCGGATAACA;
primer D, GTACCATCCGCGCACGCGAACGCGAAGTGGGTGCTGGTGCAGGTCAGCGCGAATTTGTTATCCGCCAGCGGGTGG;
primer E, CGGATGAACAGTTTCGGGCTCACGCTACGCGACCGCAGCTGGTAGGTGTGACGGGTACCATCCGCGCACGCGA;
primer F, CCGCTCGAGGGATCCTTAACGGCTGTACAGTTCCTGCTGCACTTCTTCCTGACGGATGAACAGTTTCGGGC.
Primers A and F contained NdeI and BamHI restriction sites, respectively, for insertion of the
final PCR fragment into the multiple cloning site of pET15b. PCR was performed with
Vent polymerase (New England Biolabs). Three successive PCR reactions were performed in 50 µl
volumes: the first with primers C and D for 10 cycles, the second with 1 µl of the first reaction
as a template and primers B and E for 10 cycles, and the third with 1 µl of the second
reaction as a template and primers A and F for 30 cycles. Annealing temperatures for the three
reactions were 62, 60, and 58 °C, respectively. The resulting PCR product was cleaved with NdeI and
BamHI and inserted into the multiple cloning site of pET15b, yielding plasmid pX4e, which codes for a
poly-histidine stretch followed by thrombin and PreScission cleavage sites
(MGSSHHHHHHSSGLVPRGSHMLEVLFQGP), followed by residues 16-99 of X4, a C-terminal arginine and a
stop codon. The DNA sequence of the gene coding for the poly-histidine tagged X4e protein was
confirmed by DNA sequencing.
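The stepwise assembly relies on each primer pair annealing through complementary 3' ends. As a minimal sketch (the helper names are hypothetical, and the two sequence fragments are simply the 3' ends of primers C and D as printed above), the overlap of an inner primer pair can be checked like this:

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def longest_shared_stretch(a: str, b: str) -> str:
    """Longest substring common to a and b (brute force, fine for primers)."""
    best = ""
    for start in range(len(a)):
        # Only try candidates longer than the best match found so far.
        for end in range(start + len(best) + 1, len(a) + 1):
            if a[start:end] in b:
                best = a[start:end]
            else:
                break
    return best


def annealing_region(forward: str, reverse: str) -> str:
    """Region of the forward primer that base-pairs with the reverse primer."""
    return longest_shared_stretch(forward.upper(), reverse_complement(reverse))


# 3' ends of primers C and D, copied from the primer list above.
primer_c_tail = "TTCCACCCGCTGGCGGATAACA"
primer_d_tail = "GTTATCCGCCAGCGGGTGG"
overlap = annealing_region(primer_c_tail, primer_d_tail)
print(len(overlap), overlap)
```

The same check can be applied to the B/C, D/E, A/B and E/F pairs to see which stretch each outer primer shares with the product of the previous round.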


Expression and purification of the X4 ectodomain 


Vector pX4e was transformed into E. coli BL21 (DE3) Rosetta cells (Stratagene). The transformed
cells were grown at 37 °C in 2 ml of Luria broth (LB) medium plus ampicillin (100 µg/ml) for
5-7 h. The cells were transferred into 50 ml LB medium and grown until they reached an OD600
of about 2, added to 1 l LB and grown until they reached an OD600 of 0.7. X4e
expression was induced with isopropyl β-D-galactopyranoside (IPTG) at a final concentration of
1 mM. After 7-9 h the cells were harvested and frozen at -20 °C. To obtain isotope-labelled X4e
protein, minimal medium [13] containing 1 g/l ¹⁵N-ammonium chloride, 2 g/l ¹³C-glucose, and a
vitamin cocktail (5 mg/l thiamine, 1 mg/l biotin, 1 mg/l choline chloride, 1 mg/l folic acid, 1 mg/l
niacinamide, 1 mg/l pantothenic acid, 1 mg/l pyridoxal hydrochloride, 0.1 mg/l riboflavin) was
used instead of LB.

The cell pellet from 1 l of expression culture was resuspended in 50 ml buffer B (6 M GdmCl,
20 mM Tris-HCl, 500 mM NaCl, 5 mM imidazole, 1 mM 2-mercaptoethanol, pH 8.0). Cell lysis was
carried out at room temperature for about 2 h. After centrifugation the supernatant was filtered
(0.44 µm) and loaded onto a 5 ml nickel-loaded HiTrap chelating HP column (Amersham). The
column was washed with 150 ml of buffer B, 100 ml of buffer W1 (6 M GdmCl, 20 mM Tris-HCl,
500 mM NaCl, 20 mM imidazole, 1 mM 2-mercaptoethanol, pH 8.0), 10 ml buffer W2 (buffer
W1 with an additional 10 mM imidazole) and 10 ml buffer W3 (buffer W1 with an additional 20 mM
imidazole). X4e protein was eluted with 20 ml buffer E (6 M GdmCl, 20 mM Tris-HCl,
500 mM NaCl, 500 mM imidazole, 1 mM 2-mercaptoethanol, pH 8.0).

Poly-histidine tagged X4e protein was dialyzed against buffer A (6 M GdmCl, 10 mM Tris-HCl,
100 mM sodium phosphate, 1 mM EDTA, 10 mM 2-mercaptoethanol, pH 8.0), and subsequently
against buffer D (10 mM Tris-HCl, 1 mM EDTA, 100 mM sodium phosphate, pH 8). The
denatured fraction of the protein was removed by centrifugation and the supernatant was dialyzed
against 10 mM sodium acetate pH 5, and subsequently diluted 1:4 with buffer F (20 mM
Tris/HCl pH 7, 1 mM EDTA). The poly-histidine tag was cleaved off with 100 µg PreScission
(Amersham) per mg X4e at 4 °C for about 5 h.

The protease was removed by loading the mixture on a GST column (2 ml resin; Amersham)
equilibrated in PBS and subsequent washing with 6 ml buffer P (10 mM Tris/HCl, 100 mM sodium
phosphate, pH 6). The protein was found in the flow-through as well as in the buffer P wash
fraction. To remove the poly-histidine tag and uncleaved X4e protein, both fractions were loaded
on 2 ml nickel-loaded nitrilotriacetic acid (Ni-NTA) agarose resin (Qiagen) equilibrated in
buffer P. The column was washed with 20 ml buffer P. The flow-through and the
protein-containing fraction from the buffer P wash were combined and dialyzed against 1 mM sodium
acetate pH 5.

The pure protein was analyzed by SDS-polyacrylamide gel electrophoresis. Protein concentration
was determined by absorption at 280 nm using the molar extinction coefficient of X4e of
6640 M⁻¹ cm⁻¹.
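The concentration determination follows the Beer-Lambert law, c = A / (ε · l). The following is a minimal sketch of that arithmetic; the function name, the 1 cm path length default and the example absorbance readings are assumptions for illustration, only the extinction coefficient is taken from the text.

```python
def molar_concentration(a280: float,
                        epsilon_per_m_cm: float = 6640.0,
                        path_length_cm: float = 1.0) -> float:
    """Beer-Lambert law: concentration (mol/l) from absorbance at 280 nm.

    epsilon_per_m_cm is the molar extinction coefficient quoted for X4e;
    the 1 cm path length is an assumed default cuvette geometry.
    """
    if epsilon_per_m_cm <= 0 or path_length_cm <= 0:
        raise ValueError("extinction coefficient and path length must be positive")
    return a280 / (epsilon_per_m_cm * path_length_cm)


# Hypothetical readings, chosen only to illustrate the scale of the numbers.
for absorbance in (0.664, 2.656):
    micromolar = molar_concentration(absorbance) * 1e6
    print(f"A280 = {absorbance:.3f} -> {micromolar:.0f} uM")
```

With this coefficient, a 400 µM sample would absorb well above the linear range of most spectrophotometers at 1 cm path length, so in practice a diluted aliquot would be measured and the dilution factor applied afterwards.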


NMR spectroscopy 


NMR samples contained 0.4 mM protein in 1 mM D3-sodium acetate, pH 5.0, in 93% H2O/7%
D2O. NMR spectra were recorded at 315 K on Varian Unity INOVA spectrometers equipped with
triple-axis pulse-field-gradient (PFG) triple resonance probes and cryogenically cooled Z-axis
PFG triple resonance probes at proton frequencies of 600 and 800 MHz. Uniformly ¹³C/¹⁵N-labelled
protein was used for all experiments. The resonance assignment of X4e was obtained using the
following experiments: ¹H-¹⁵N-HSQC, ¹H-¹³C-HSQCs, HNCACB, C(CO)NH, HNCO, HNHA,
HCCH-COSY. Aromatic side chain resonances were assigned through (HB)CB(CGCD/CE)HD/HE
and ¹³C-edited HSQC-NOESY experiments. Structural constraints were derived from ¹⁵N-edited
NOESY-HSQC (120 ms mixing time) and aliphatic ¹³C-edited HSQC-NOESY (100 ms)
experiments in the described buffer.

NMR spectra for steady-state ¹H-¹⁵N NOE and T2 relaxation time measurements [14] were
recorded on ¹⁵N-labelled protein under the same conditions as the other NMR experiments.
¹H-¹⁵N NOE spectra were collected for sensitivity reasons at 800 MHz on a Varian Unity INOVA
spectrometer equipped with a cryogenically cooled Z-axis PFG triple resonance probe. Spectra
recorded with proton saturation utilized a 1 s recycle delay followed by a 3 s period of
saturation, while spectra recorded in the absence of saturation employed a recycle delay of 4 s. T2
experiments were recorded at 600 MHz on a Varian Unity INOVA spectrometer with a
conventional probe due to concerns about heating during the CPMG period on the cryogenically
cooled probe at 800 MHz. T2 spectra were recorded with a recycle delay of 3 s. Values of
the steady-state ¹H-¹⁵N NOE were obtained from the ratio of peak intensities of spectra recorded
with and without proton saturation. Values of T2 were determined by fitting the measured peak
volumes to a single exponential decay curve.
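The two per-residue quantities can be sketched as follows. This is an illustration only, not the software used in this work: the function names and the synthetic data are assumptions, and the T2 fit uses a log-linear least-squares shortcut in place of a non-linear single-exponential fit.

```python
import math


def steady_state_noe(intensity_saturated: float, intensity_reference: float) -> float:
    """Steady-state NOE as the intensity ratio with/without proton saturation."""
    return intensity_saturated / intensity_reference


def fit_t2(delays_s, volumes):
    """Estimate T2 from V(t) = V0 * exp(-t / T2).

    Uses linear least squares on ln V versus t (an assumed simplification);
    returns (T2 in seconds, V0).
    """
    log_v = [math.log(v) for v in volumes]
    n = len(delays_s)
    mean_t = sum(delays_s) / n
    mean_y = sum(log_v) / n
    sxx = sum((t - mean_t) ** 2 for t in delays_s)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(delays_s, log_v))
    slope = sxy / sxx                      # equals -1 / T2
    intercept = mean_y - slope * mean_t    # equals ln V0
    return -1.0 / slope, math.exp(intercept)


# Synthetic, noise-free decay for a hypothetical residue (T2 = 80 ms).
delays = [0.01, 0.03, 0.05, 0.09, 0.13]
volumes = [100.0 * math.exp(-t / 0.08) for t in delays]
t2, v0 = fit_t2(delays, volumes)
print(f"T2 = {t2 * 1000:.1f} ms, V0 = {v0:.1f}")
print(f"NOE = {steady_state_noe(62.0, 80.0):.3f}")
```

Residues in a rigid fold typically show high NOE ratios and short T2 values, whereas flexible segments show reduced or negative NOE ratios and longer T2 values.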


Data evaluation and structure calculation 


Based on the almost complete assignment of ¹H, ¹³C, and ¹⁵N resonances of X4e, a total of 1688
NOE-derived experimental constraints (including 623 long-range distance constraints) could be
derived from three-dimensional NOESY spectra in an iterative procedure (Table 1). NOE analysis
and assignment was performed using CARA [15] and ARIA [16]. Interproton distances were used
directly to calibrate experimental peaks and to extract distance constraints. Lower and upper
bounds for distance constraints were derived from the target distances empirically by estimation of
the error as 12.5% of the target distance squared. Distances involving ambiguous constraints,
methyl groups, aromatic ring protons and non-stereospecifically assigned methylene protons were
treated as sum of separate contributions to the target function, known as "sum averaging" [17].
No hydrogen bonds or predetermined secondary structure elements were used as input.
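The bound estimation and the sum averaging described above can be written out in a few lines. This is a sketch under stated assumptions: distances are taken in Å (the unit in which a 12.5% d² error term is conventionally applied), the function names are hypothetical, and the code is not taken from ARIA or CNS.

```python
def distance_bounds(target: float, error_factor: float = 0.125):
    """Lower/upper bounds with an error of error_factor * target**2.

    target is assumed to be in angstrom; the lower bound is clipped at zero.
    """
    error = error_factor * target ** 2
    return max(target - error, 0.0), target + error


def sum_averaged_distance(distances):
    """Effective distance for several contributions: (sum of d**-6) ** (-1/6)."""
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)


lower, upper = distance_bounds(4.0)
print(f"target 4.0 A -> bounds {lower:.1f} to {upper:.1f} A")

# Two equivalent protons at 3.0 A each appear closer than either one alone.
print(f"effective distance: {sum_averaged_distance([3.0, 3.0]):.3f} A")
```

Because the error grows with the square of the distance, weak NOEs from distant proton pairs receive much wider bounds than strong short-range NOEs.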

Final structures were calculated with ARIA using CNS without automatic assignment. The 15
lowest-energy structures out of 100 calculated structures were further refined by a short
simulated annealing run in an explicit water shell. Of those, the 10 lowest-energy structures that
did not show any distance constraint violation of more than 0.03 nm were used for further analysis.
Geometry of the structures, structural parameters and secondary structure elements were analyzed
and visualized using the programs MOLMOL [18], PROCHECK [19] and WHATIF [20]. The
coordinates have been deposited in the Protein Data Bank with accession code 1YO4.
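The two-stage selection can be expressed as a simple filter-and-sort. The sketch below uses a hypothetical `Model` record and made-up energies purely for illustration; it is not the ARIA/CNS selection code.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    energy: float             # total energy, arbitrary units
    max_violation_nm: float   # largest distance-constraint violation


def lowest_energy(models, n_keep):
    """Keep the n_keep models with the lowest energy."""
    return sorted(models, key=lambda m: m.energy)[:n_keep]


def select_ensemble(refined, n_keep=10, max_violation_nm=0.03):
    """From refined models, drop those violating any constraint by more than
    the threshold, then keep the n_keep lowest-energy ones."""
    accepted = [m for m in refined if m.max_violation_nm <= max_violation_nm]
    return lowest_energy(accepted, n_keep)


# Made-up example: 100 calculated models, every 7th carries a large violation.
calculated = [Model(f"model_{i:03d}", energy=float(i),
                    max_violation_nm=0.05 if i % 7 == 0 else 0.01)
              for i in range(100)]
stage_one = lowest_energy(calculated, 15)   # would be passed to water refinement
ensemble = select_ensemble(stage_one)
print([m.name for m in ensemble])
```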


Results and discussion 


Protein expression, purification and structure determination


¹⁵N-¹³C-double-labelled X4 ectodomain (X4e) was prepared as described in the methods part.
In the following, numbering of the amino acid residues of X4e is according to the residue's
position in the mature protein, e.g. amino acid residue 1 of X4 is coded by codon number 16 of
the ORF7a gene. To prepare a sample for NMR spectroscopy, 1.3 mg protein was used to obtain a
400 µM protein solution in 1 mM sodium acetate buffer, pH 5. Virtually all proton, ¹⁵N, and ¹³C
resonances were assigned (Figure 1) and deposited in the BMRB data bank (accession code: 6824).
Experimental NOE distance constraints were collected from a ¹³C-HSQC-NOESY spectrum and in
addition from a NOESY-¹⁵N-HSQC spectrum recorded with a ¹⁵N-labelled sample. Altogether
1974 constraints were used for the final structure calculation using a simulated annealing protocol.
The final structure family consisted of ten structures that fulfilled the experimental
distance constraints with a maximum deviation of less than 0.03 nm (Figure 2a). The structures
were deposited in the PDB data bank (accession code: 1YO4).
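Since two numbering schemes are in use (ORF7a codons and mature-protein residues), a small conversion helper may be useful. The sketch assumes only what is stated above, namely that residue 1 of the mature protein corresponds to codon 16; the function names are hypothetical.

```python
SIGNAL_PEPTIDE_LENGTH = 15  # codons 1-15 precede residue 1 of the mature protein


def mature_position(codon_number: int) -> int:
    """Convert an ORF7a codon number to the mature-protein residue number."""
    if codon_number <= SIGNAL_PEPTIDE_LENGTH:
        raise ValueError("codon lies within the signal sequence")
    return codon_number - SIGNAL_PEPTIDE_LENGTH


def codon_number(mature_residue: int) -> int:
    """Convert a mature-protein residue number back to the ORF7a codon number."""
    if mature_residue < 1:
        raise ValueError("mature residue numbers start at 1")
    return mature_residue + SIGNAL_PEPTIDE_LENGTH


# The cloned ectodomain (codons 16-99) in mature numbering:
print(mature_position(16), mature_position(99))
```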


Structure description and fold classification 


Residues 1-65 of the X4 ectodomain form a well-defined beta-sandwich fold. Residues 66-84
appear to be unstructured, as indicated by decreased heteronuclear ¹H-¹⁵N NOE values (Figure 3) and
the lack of experimental NOE-derived structural data for this part of the protein. The
well-structured part of X4e is built up from seven beta-strands, so that four strands form one
beta-sheet and three strands form a second sheet (Figure 2b). The sheets are closely packed or
"sandwiched" against each other. Each sheet is amphipathic with the hydrophobic side facing
inward. The larger four-stranded beta-sheet consists of strands A, G, F, and C, the smaller
three-stranded beta-sheet consists of strands B, E, and D. All beta-strands align in anti-parallel
fashion, as is well known for most immunoglobulin-like domains, with the exception of strand
A, which aligns parallel to strand G.

Table 1. Constraints and structural statistics for the resulting 10 NMR structures of X4.

Structural statistics of the ensemble of X4 structures.

Number of experimental restraints
  Intra-residue unambiguous NOEs                                703
  Sequential unambiguous NOEs                                   417
  Medium-range unambiguous NOEs                                 108
  Long-range unambiguous NOEs                                   477
  Total unambiguous NOEs                                        1705
  Total ambiguous NOEs                                          269

RMSD (nm) from the mean (residues 1-65)
  All backbone atoms                                            0.055 ± 0.010
  All heavy atoms                                               0.095 ± 0.009
  Secondary structure backbone atoms                            0.037 ± 0.007
  Secondary structure heavy atoms                               0.081 ± 0.009

Non-bonded energy values after water-refinement (kcal/mol)
  van der Waals                                                 -707 ± 15
  Electrostatic                                                 -3286 ± 97

RMSD from idealized covalent geometry
  Bonds (nm)                                                    0.00044 ± 0.00002
  Angles (°)                                                    0.52 ± 0.03
  Impropers (°)                                                 1.46 ± 0.15

RMSD from experimental data
  Distance (nm)                                                 0.0021 ± 0.0002

Number of restraint violations
  Distance (> 0.03 nm)                                          0
  Distance (> 0.01 nm)                                          26.9 ± 4.3

Ramachandran analysis (residues 1-65)
  Residues in most favoured regions (%)                         73.1
  Residues in additional allowed regions (%)                    24.5
  Residues in generously allowed regions (%)                    0.9
  Residues in disallowed regions (%)*                           1.5

*In all structures, Ser22 was found in a disallowed region. This may be due to fast local conformational
exchange, which may lead to different local conformations that cannot be distinguished on the NMR time
scale and ultimately yields experimental NMR constraints that are in accordance only with a
time-averaged structure rather than one of the limit conformations. Although the limit conformations
may well be within one of the allowed regions of the Ramachandran plot, the average structure is not.
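The restraint counts in Table 1 can be cross-checked against the total of 1974 constraints quoted in the text. The short sketch below does only that arithmetic; the variable names are illustrative, and the disallowed-region percentage is the value read from the table, which is consistent with one residue (Ser22) out of 65.

```python
# Unambiguous NOE restraints by class, as listed in Table 1.
unambiguous = {
    "intra-residue": 703,
    "sequential": 417,
    "medium-range": 108,
    "long-range": 477,
}
ambiguous = 269

total_unambiguous = sum(unambiguous.values())
total_restraints = total_unambiguous + ambiguous
print(total_unambiguous, total_restraints)

# Ramachandran percentages: most favoured, additional, generous, disallowed.
ramachandran = [73.1, 24.5, 0.9, 1.5]
print(round(sum(ramachandran), 1))

# One residue in a disallowed region out of the 65 structured residues.
print(round(100.0 * 1 / 65, 1))
```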

Two disulfide bonds link both sheets on opposite edges (Figure 2b), thereby stabilizing the
beta-sandwich structure. At the top of the structure, defined by the BC, DE and FG loops [22], a
disulfide bridge between Cys20 (BC loop) and Cys52 (end of the F strand) creates a compact tip
in the structure. At the bottom, which is defined by the AB, CD and EF loops, a disulfide bridge
links Cys8 at the end of strand A with Cys43 at the end of strand E.

Not surprisingly, the solution structure of residues 1-65 of X4e obtained in the present study
is very similar to the X-ray structure reported very recently [7] (Figure 2a). The overall
backbone root mean square deviation (rmsd) between both structures is 0.11 nm.

The beta-sandwich domain measures approx. 3.5 x 2.7 x 2.0 nm, in which the longest distance
corresponds to the top-bottom distance. The AGFC beta-sheet extends along the full height of
the structure, whereas the BED beta-sheet is significantly smaller and located rather to the
bottom of the AGFC beta-sheet.

Figure 1. ¹H, ¹⁵N-HSQC spectrum of SARS-CoV X4 ectodomain. Resonances are labelled according to the
respective residue's position in the mature protein, e.g. residue 1 of X4 is coded by codon number 16
of the ORF7a gene. The asterisk indicates the amide resonance position of Ser22, which is hardly
visible, probably due to local conformational exchange.

The overall appearance of the seven-stranded beta-sandwich fold comprising residues 1-65 of
X4e strongly resembles an immunoglobulin (Ig)-like fold. So far, several types of Ig-like folds
have been described [23]. The prototype fold is called c-type, with strands A, B, E, and D forming
one sheet and strands C, F, and G forming the other sheet (Figure 4). A variant, called s-type, has a
topology with strand D being switched from the ABED sheet to the CFG sheet. This strand is then
called C'. X4e is a variation of the c-type with the A strand being attached to the CFG sheet.
Although it was already suggested more than 10 years ago to introduce a new subtype for such a
topology within the Ig-like folds [23], X4e is to our knowledge the first pure member of that subtype
of Ig-like folds. We suggest the name "p-type" in accordance with the parallel orientation of A and
G strands.
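The difference between the c-type prototype and the topology found in X4e can be made explicit by writing each fold as a mapping from sheets to strands. This is an illustrative sketch only: the dictionary layout and function names are assumptions, and the sheet labels simply pair each X4e sheet with its c-type counterpart.

```python
# Strand composition of the two sheets, as described in the text.
C_TYPE = {"first sheet": ["A", "B", "E", "D"], "second sheet": ["C", "F", "G"]}
X4E = {"first sheet": ["B", "E", "D"], "second sheet": ["A", "G", "F", "C"]}


def sheet_of(topology):
    """Invert a sheet -> strands mapping into strand -> sheet."""
    return {strand: sheet
            for sheet, strands in topology.items()
            for strand in strands}


def switched_strands(reference, variant):
    """Strands present in both folds that sit in different sheets."""
    ref, var = sheet_of(reference), sheet_of(variant)
    return sorted(s for s in ref if s in var and ref[s] != var[s])


print(switched_strands(C_TYPE, X4E))
```

For these two topologies the comparison singles out strand A, which is the feature motivating the proposed "p-type" name.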

A partial switch of strand A from one sheet (BED) to the other (CFG) is known from some
proteins, where strand A is subdivided into A and A' with each part of the strand being attached to
the BED and CFG sheets, respectively. Examples are the D1 domains of intercellular adhesion
molecules ICAM-1 [24] and ICAM-2 [25]. D1 and D2 domains of the IL-1 receptor exhibit a
completely switched strand A. Both domains contain, in contrast to X4e, short C' strands and were
classified as "c-type" [26]. In addition, numerous "v-type" Ig-like folds are known with a
nine-stranded beta-sandwich (ABED and GFCC'C''), where the A strand is completely switched from
the BED sheet to the GFCC'C'' sheet (Figure 4).
Figure 2. Solution structure of SARS-CoV X4 ectodomain after simulated annealing and refinement
calculations. (a) Shown is a superposition of the protein backbones of all 10 obtained structures
(fine lines) with the recently published X-ray crystal structure 1XAK (backbone worm representation).
(b) Ribbon representation of the lowest-energy solution structure of the SARS-CoV X4 ectodomain.
Secondary structural elements are accentuated and labelled according to their sequential arrangement.
Heavy atom sidechains of cysteines 8, 20, 43 and 52 are shown in yellow to visualize the disulfide
bridges. (c) Surface contours and charge distribution of the X4 solution structure. The surface of the
X4 lowest-energy structure is coloured according to the electrostatic potential computed and visualized
by the DelPhi module of the Accelrys Insight II molecular modelling system. Regions of basic
potential are shown in blue; acidic regions are in red. Surface-exposed amino acid residues of
particular interest are labelled. Please note that numbering of the amino acid residues of X4e is
according to the residue's position in the mature protein, e.g. amino acid residue 1 of X4 is coded by
codon number 16 of the ORF7a gene. Figures were prepared and secondary structure elements
identified using MOLMOL [18, 21].


At the top of the BED sheet, the DE loop is 
protruding from the structure and together with 
beta-strands C and D delineates a groove on the 
surface of X4 (Figure 2c). Central to the groove is 
the sidechain-carboxyl group of residue Glu 18, 
contributing a negative charge to the bottom of 
the otherwise hydrophobic groove. Such a hydro- 
phobic patch on the molecule surface with a 
central negative electrostatic potential may form a 
potential site for ligand interaction. 

The very well structured Ig-like domain of X4e comprises all residues from Glu1 to Arg65. The
putative membrane-spanning segment of full-length X4 starts with residue Gln80. Residues
Ser66 to Gln79 are flexible and unstructured in the solution structure, which is in accordance with
the crystal structure. Although these residues might possibly form a more defined structure in the
presence of a lipid bilayer, a potential role of the flexible part may be to allow the Ig-like
domain to bind to membrane-distant epitopes of binding partners, or it may be a target for
extracellular proteases to allow shedding of X4e.

Figure 3. NMR dynamic data of SARS-CoV X4 ectodomain. Plots of ¹⁵N T2 and steady-state ¹H-¹⁵N NOE
recorded at 42 °C. Steady-state ¹H-¹⁵N NOE values (black circles) were determined from data collected
at 800 MHz. T2 relaxation values (hollow circles) were determined from data collected at
600 MHz. Both ¹⁵N T2 and steady-state ¹H-¹⁵N NOE values support the hypothesis of an unstructured
carboxy-terminal region encompassing residues 66-84 in contrast to the well-folded β-sandwich
comprising residues 1-65. Neither ¹H-¹⁵N NOE nor ¹⁵N T2 values exhibit any special features within
the β-sandwich fold.


Figure 4. 2D topology diagrams of observed beta-sheet formation in various Ig-like folds based on the
topology subtyping of Bork and co-workers [23]. Strands are labelled in alphabetical order from N- to
C-terminus. Affiliation of each strand to either one of the two sheets is indicated by grey or
black filling. Please note that not all subtypes are shown.


Structure based prediction of X4 function 


As was already found by others [2], X4 does not show any significant sequence homology to other
proteins in the databases. Therefore, no obvious function of X4 could be derived from any sequence
similarity to proteins with known function. Structure-based predictions of functions on the basis of
similarities to proteins with known functions have been successfully used in the past [45-48], and
this kind of approach is a major driving force for structural genomics projects.

In order to identify potential functions of X4 we used the solution structure to search for
proteins with similar three-dimensional structures applying the tools DALI [27] and VAST
(www.ncbi.nlm.nih.gov/Structure/VAST/vastsearch.html).

Employing DALI, we compared the solution structure of X4e with known structures from the
Protein Data Bank (PDB). The best hit, with a Z-score of 4.8 indicating high reliability, is the D1
domain of ICAM-2. The structures of ICAM-2 and X4e can be aligned with each other yielding an
rmsd of 0.20 nm based on 59 Cα-carbon coordinate pairs (Figure 5). Another interesting hit
appeared to be the D1 domain of the IL-1 receptor with a Z-score of 3.9. Both structural similarities
of X4e were already found using the X4e crystal structure and DALI [7].

Using the X4e solution structure for a VAST search revealed the D1 domain of ICAM-1 to be
most similar to X4e with an rmsd of 0.18 nm based on 59 Cα-carbon coordinate pairs (Figure 5).
ICAM-1 is very similar to ICAM-2 with respect to structure and amino acid sequence.
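An rmsd over a set of Cα pairs is the square root of the mean squared distance between equivalent atoms. The sketch below illustrates that definition for coordinates that are assumed to be already superimposed; it does not reproduce the alignment procedures of DALI or VAST, and the function name, the optional distance cut-off and the toy coordinates are assumptions.

```python
import math


def paired_rmsd(coords_a, coords_b, cutoff_nm=None):
    """RMSD (nm) over equivalent atom pairs of two superimposed structures.

    Pairs further apart than cutoff_nm (if given) are treated as not
    equivalent and skipped. Returns (rmsd, number of pairs used).
    """
    squared = []
    for p, q in zip(coords_a, coords_b):
        d2 = sum((x - y) ** 2 for x, y in zip(p, q))
        if cutoff_nm is None or math.sqrt(d2) <= cutoff_nm:
            squared.append(d2)
    if not squared:
        raise ValueError("no atom pairs within the cut-off")
    return math.sqrt(sum(squared) / len(squared)), len(squared)


# Toy coordinates in nm: two close pairs and one pair beyond the cut-off.
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 0.1), (1.0, 0.0, -0.1), (2.0, 0.0, 3.0)]
rmsd, n_pairs = paired_rmsd(structure_a, structure_b, cutoff_nm=0.5)
print(f"rmsd = {rmsd:.2f} nm over {n_pairs} pairs")
```

Reporting the number of pairs alongside the rmsd matters, since a lower rmsd over fewer pairs does not necessarily indicate a better overall match.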


Comparison of X4e with ICAM-1 and ICAM-2 D1 domains


ICAM-1 and ICAM-2 are cell adhesion molecules expressed on the surface of cells, especially on
endothelial cells after cytokine-mediated stimulation at inflammatory sites [28]. ICAMs belong to a
subset of Ig-like superfamily proteins, which are specialized for binding to integrins. Integrins in
general are non-covalently associated α/β heterodimeric transmembrane proteins, which are
involved in adhesive cell-cell interactions. ICAM-1 and ICAM-2 are known to specifically interact
with lymphocyte-function-associated antigen 1 (LFA-1, CD11a/CD18, αLβ2-integrin) that is
expressed mainly on lymphocytes. ICAM-LFA-1 interactions play a crucial role in lymphocyte
attachment and homing to inflammation sites [29]. The ICAM-1 binding site on LFA-1 is the
180-residue I-domain of the αL subunit and the binding interface is well described [30].

X4 aligns structurally very well with D1 domains of ICAM-1 and ICAM-2 (Figure 5).
The most obvious differences in the structural alignment are the shortened β-strands and the
corresponding BC, DE and FG loops at the top of the structure. The only topological difference
between both ICAM D1 domains and X4e is β-strand A, which is split between the β-sheets
in the ICAMs, but is completely aligned to the CFG sheet in X4. Another striking difference is the
somewhat different location of the disulfide bonds in both ICAMs and X4e.

Figure 5. Structural similarity of the SARS-CoV X4 ectodomain to Intercellular Adhesion Molecules
(ICAMs). (a) Superposition of the lowest-energy solution structure of the SARS-CoV X4 ectodomain (in
red) with the D1 domains of Intercellular Adhesion Molecule 1 (ICAM-1, in green) and Intercellular
Adhesion Molecule 2 (ICAM-2, in blue). Shown is a backbone trace of the molecules; the structural
superposition was obtained in an iterative fitting procedure with a cut-off distance of 0.5 nm and
corresponds to the sequence alignment shown in Figure 6. (b-d) Ribbon representation with accentuated
secondary structural elements of X4 ectodomain (b), ICAM-1 (c) and ICAM-2 (d). The orientation and
colour of all molecules is the same as in panel (a). These figures were prepared using MOLMOL [18].

Figure 6. Structure-based sequence alignment of the SARS-CoV X4 ectodomain to the D1 domain of
Intercellular Adhesion Molecules (ICAMs). Shown is a sequence alignment of the SARS-CoV X4 ectodomain
(top) with the D1 domains of Intercellular Adhesion Molecule 2 (ICAM-2, middle) and Intercellular
Adhesion Molecule 1 (ICAM-1, bottom). Secondary structural elements and sequence numbering of the X4
ectodomain are shown above the sequences; secondary structural elements of ICAM-2 and sequence
numbering of ICAM-1 are shown below the sequences. The sequence alignment is based on spatial
proximity (cut-off distance 0.5 nm) in the structural superposition of all three structures. The
alignment was prepared using UCSF Chimera [31] and ALSCRIPT [32].

The structure-based sequence alignment of X4 with ICAM-1 and ICAM-2 D1 domains shows
little sequence identity among the proteins (Figure 6). However, the key residue for LFA-1
interaction of ICAM-1 and ICAM-2, Glu34 and Glu37, respectively, is present in X4 at the
homologous or analogous sequence position (Glu26), which is the last residue of β-strand C. This
glutamic acid residue in the ICAM-1 D1 domain forms a direct coordination to the Mg²⁺ ion of the
metal-ion dependent adhesion site (MIDAS) in the LFA-1 I-domain [30]. A further characteristic
feature of the I-domain binding site in ICAM-1 is a ring of hydrophobic residues around this
glutamate residue (Pro36, Tyr66, Met64, and the aliphatic portions of Gln62 and Gln73) [30]. This
feature can also be found in X4e, where Glu26 is likewise surrounded by a ring of hydrophobic
residues (Figure 2c).

To our knowledge, besides the αL subunit of LFA-1, no other cellular binding partner for the
D1 domain of ICAM-1 is known so far. I-domains of αM integrin subunits are known to bind
ICAM-1, but via its D3 domain [33]. Weak binding capabilities for ICAM-1 are further reported for
αX subunits [34], but presumably not via its D1 domain [35]. For αD integrin subunits it was shown
that they do not bind ICAM-1 [36].

The relevance of the very obvious structural similarity of X4e with ICAM-1 and ICAM-2 D1
domains is hard to estimate. But based on this structural similarity and the described common
features of amino acid sequence and surface appearance of X4 with the well characterized αL
integrin I domain binding site on ICAM-1 D1, we suggest that X4 contains a binding site for the αL
integrin subunit I-domain of LFA-1. Although experimental data will be needed to confirm the
prediction, we carried out a modelling study on X4 and the αL integrin subunit I-domain of LFA-1.
Interestingly, the resulting complex could be formed without obvious steric problems (Figure 7).


Hypothetic consequences of the proposed LFA-1 
binding activity of X4 


The consequences of a predicted LFA-1 binding activity of X4 depend largely on the subcellular
localization of X4 in infected cells or virus particles. In the following, we speculate on
potential functions of X4 as an LFA-1 binding protein, depending on the subcellular localization
of X4 and based on examples from other proteins with known functions.

The presence of LFA-1-binding X4 molecules
on the virus surface would allow the virus to use
LFA-1 as a receptor for cell entry. An example
is known from HIV-1: virus particles bearing
incorporated host-encoded ICAM-1 on their surface
show a 5- to 10-fold increase in infectivity,
caused by an interaction between virally
incorporated ICAM-1 and cell surface LFA-1 [37].
So far, however, there are no reports of X4 being
detected in virus particles.

X4 was already described to be primarily
located in the ER of infected cells and to contain
an ER retention signal [9]. If X4 is able to bind
LFA-1, it could prevent delivery of newly
synthesized LFA-1 molecules from the ER to the cell
surface. Prominent examples of viral accessory
proteins with such functions are known from other
viruses, e.g. HIV-1 Vpu binds to CD4, prevents
CD4 delivery to the cell surface and even induces
its degradation [38, 39]. LFA-1 is exclusively
expressed on the surface of leukocytes including
T cells and dendritic cells. It mediates several
adhesive interactions between cells of the immune
system, e.g. dendritic cells and T cells, B cells and
T cells, T cells and their target cells, as well as the
interactions of leukocytes with the endothelium
and the transendothelial migration of leukocytes
[40]. Loss of LFA-1 leads to severe defects of the
immune system as can be seen in the leukocyte
adhesion deficiency (LAD) syndrome [41].

One study reported small amounts of X4 on the
surface of infected cells [7]. The presence of X4
with LFA-1 binding activity on the surface of
infected cells could, for example, interfere with T
cell homing, or increase the infected cells' affinity
for leukocytes, and could even induce apoptosis in
LFA-1 presenting T cells. Leukotoxin from
Actinobacillus actinomycetemcomitans is an example of
a protein that is expressed on the surface of
infected cells, binds to LFA-1 on T cells, and
induces apoptosis via caspase-3 dependent
pathways in these cells [42]. Indeed, overexpression of
X4 in Vero E6 cells induces apoptosis via a
caspase-3 dependent pathway [12]. Whether
LFA-1 is involved in the mechanism, however, is
not known.

Lymphopenia is a common observation among
SARS patients [43]. CD4+ cells are more affected
than CD8+ cells. The reason for the lymphocyte
depletion is not known. Interestingly, AIDS
patients are characterized by a CD4+ depletion.
It is suggested that the disappearance of CD4+ T
cells in the blood is the result of increased
migration of CD4+ cells from the blood into
tissues. Secondary signals through homing
receptors received during the homing process induce
apoptosis in many of these cells [44].


Comparison of X4e with IL-1 receptor D1 domain 


A second site suitable for a protein-protein
interaction is suggested by the similarity of X4e to
the second best hit for structural similarity from
VAST and DALI searches, namely the D1 domain
of the IL-1 receptor. D1 and D2 domains of the
IL-1 receptor are of the Ig-like fold with a switched
strand A, like X4, but in contrast to X4 contain a
short strand C'. The similarity between X4e and the
D1 domain of the IL-1 receptor is largely based on the
switched strand A and the resulting low rmsd
value for the majority of Cα coordinates of both
protein domains. The sequences of both domains
did not reveal the slightest similarity. An
interesting feature of the IL-1 receptor D1 domain,
however, is the ridge between strands A and B
forming the binding site for interleukin-1. Such a
ridge can be found in X4e as well, and was already
described for the X-ray structure of X4e [7].

Figure 7. Modelling study of the SARS CoV X4 ectodomain and the αL integrin subunit I-domain of LFA-1. X4 is colored
blue, the integrin I domain orange, and the Mg2+ ion of the metal ion-dependent adhesion site (MIDAS) magenta. The
I-domain MIDAS residues and Glu26 of X4 are shown as ball-and-stick models. The model was constructed based on the
coordinates of the ICAM-1 complex with the intermediate affinity I-domain of αLβ2 [30] (PDB code: 1MQ8). The SARS CoV X4
protein was superimposed onto ICAM-1 based on protein backbone coordinates of structurally aligning residues of the
potential binding interface, comprising β-strands C, D and F, and the CD loop. Sidechain torsion angles of Glu26 were
adjusted for steric reasons. The model was prepared using UCSF Chimera [31].
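
The rmsd over Cα coordinates used to judge the similarity of two domains can be written down in a few lines. The sketch below assumes that the two coordinate sets are already superimposed and paired residue by residue; it does not perform the optimal superposition or the structural alignment that tools such as VAST and DALI carry out, and the function name `rmsd` and the toy coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (angstrom) between paired coordinates.

    Assumes both sets are already superimposed and listed in matching order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate sets of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

# Hypothetical toy C-alpha traces (angstrom), not taken from any structure.
ca_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
ca_b = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]

# Every pair is displaced by 1 angstrom, so the rmsd is 1 angstrom.
print(rmsd(ca_a, ca_b))
```

Restricting the calculation to the majority of Cα atoms, as described above, would simply mean passing only the structurally aligned residue pairs.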

Taken together, the data show that the ectodomain
of the SARS CoV coded X4 protein adopts an Ig-like
fold that shows some features resembling those of
the D1 domain of ICAM-1. Based thereon, we
predict X4 to have an LFA-1 binding activity.
Several observations from SARS patients correlate
well with such an activity coded in the SARS CoV
genome. In any case, an investigation of the
proposed potential of X4 to bind LFA-1
may help to elucidate the role of X4 in replication
and pathogenesis of SARS CoV.